Nothing in SARS-CoV-2 makes sense except in the light of RNA modification?

Tweetable abstract The expression pattern of RNA deaminases determines the mutation and evolution of SARS-CoV-2.

purpose was) are actually dealing with the evolution patterns of RNA deamination sites. This is why nothing in SARS-CoV-2 makes sense except in the light of RNA deamination.

Expression pattern of RNA deaminases explains the mutation profile of SARS-CoV-2
To comprehend the evolution process behind the mutation profile of SARS-CoV-2, one should first understand how the RNA deaminases (ADARs and APOBECs) work.
(1) Tissue specificity: The ADAR family has three members in humans (ADAR1-3), all of which are considerably expressed in the lungs and nervous system. The APOBEC family comprises multiple members, whose expression is also enriched in lungs and nerves [12]. This suggests that when SARS-CoV-2 infects human lungs, it will be highly 'mutable' there owing to the tissue-specific expression of RNA deaminases.
(2) Subcellular localization: ADAR2 and ADAR3 are located in the nucleus, while only one isoform of ADAR1 (p150) is expressed in the cytosol [13]. In contrast, APOBECs have a much wider localization spectrum, ranging from the nucleus to the cytosol as well as other cellular compartments. Since SARS-CoV-2 mainly invades the cytosol, the localization of deaminases explains why C-to-U sites are more prevalent than A-to-I sites in SARS-CoV-2 (although both deamination types are already much more prevalent than other substitutions).
(3) Sequence preference: ADAR3 is inactive in mammals, ADAR2 preferentially deaminates coding sequences and ADAR1 mainly deaminates noncoding regions [14]. Since most of the SARS-CoV-2 'genome' is coding region, it is intuitive to think that ADAR2 should be the 'chief editor' of A-to-I events in SARS-CoV-2. Unfortunately, the nucleus-located ADAR2 generally cannot access SARS-CoV-2 RNA. In contrast, the widely expressed APOBECs show no preference for coding or noncoding regions, so C-to-U sites are catalyzed in an unbiased manner. This dilemma for ADARs again explains why A-to-I substitutions are much fewer than C-to-U substitutions in SARS-CoV-2.
Altogether, the expression patterns of RNA deaminases explain the mutation profile of SARS-CoV-2, and there is no reason to omit this information when one studies the sequence evolution of SARS-CoV-2.
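The comparison of C-to-U and A-to-I abundance discussed above can be illustrated with a small tally of substitution types. The following Python sketch is only a toy illustration under stated assumptions: the function name `substitution_spectrum`, the reference and the four reads are hypothetical, the reads are assumed to be gap-free and already aligned to the reference, and the RNA alphabet (U rather than T) is used for readability. Because inosine is read as guanosine during sequencing, A-to-I events are counted here as A>G.

```python
from collections import Counter


def substitution_spectrum(reference, reads):
    """Tally substitution types (ref>alt) over gap-free, pre-aligned reads.

    Hypothetical helper for illustration only; it does not perform
    alignment, quality filtering or strand assignment.
    """
    spectrum = Counter()
    for read in reads:
        for ref_base, alt_base in zip(reference, read):
            # Count only unambiguous mismatches between RNA bases.
            if ref_base != alt_base and ref_base in "ACGU" and alt_base in "ACGU":
                spectrum[f"{ref_base}>{alt_base}"] += 1
    return spectrum


# Toy data (assumed, not real SARS-CoV-2 sequence).
reference = "ACGUACGUAC"
reads = [
    "AUGUACGUAC",  # one C>U mismatch
    "ACGUGCGUAC",  # one A>G mismatch (A-to-I read as G)
    "ACGUACGUAU",  # one C>U mismatch
    "ACGUACGUAC",  # identical to the reference
]

spectrum = substitution_spectrum(reference, reads)
for substitution, count in spectrum.most_common():
    print(substitution, count)
```

On real data, the same tally would need to distinguish sequencing errors from true deamination events, which this sketch does not attempt.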

SARS-CoV-2 has the potential to infect the human nervous system, a tissue with abundant RNA deaminases
We have noted that the expression of ADARs and APOBECs is highest in the lungs and nervous system. Interestingly, apart from the commonly known infection of the lungs, many cases suggest that SARS-CoV-2 can also infect the nervous system, leading to neurological symptoms such as encephalopathy and delirium [15,16]. This scenario might seem to imply that SARS-CoV-2 selectively (or 'deliberately') infects tissues with high expression of RNA deaminases, accelerating its own mutation and evolution rate. However, we emphasize that tissue tropism is a separate problem that need not be directly related to the presence of host deaminases. The observation of SARS-CoV-2 infecting the nervous system might simply be the result of natural selection. One possible evolutionary trajectory is that, among all tissues or cell types, only the lungs and nervous system could force the viral sequence to change rapidly, providing more options for the virus to increase its fitness in hosts. Consequently, only the SARS-CoV-2 strains with adaptive mutations (which are randomly obtained) could survive. The selectively maintained viral strains therefore inherited the ability to infect the nervous system.
Without considering the RNA deaminases, one could hardly understand why SARS-CoV-2 infects the lungs and nervous system. Even with the structural evidence that only particular receptors on the cell membrane can interact with SARS-CoV-2, this antigen-receptor relationship is highly susceptible to mutations. Natural selection is the only force that can create a new host-virus relationship (by positive selection) or maintain an existing one (by purifying selection).

Parameters like linkage disequilibrium are only meaningful at the single-molecule level, not at the individual-host level
In the extensive literature on SARS-CoV-2 evolution, one often sees traditional analyses of linkage disequilibrium (LD), recombination, Theta-Pi or Tajima's D [17]. Here, in the light of RNA deaminases, we point out the flaws and paradoxes behind these previous studies.
In humans, each individual is diploid, so LD analysis can be performed across a human population. But for SARS-CoV-2, if one intends to analyze LD, how should 'a virus individual' or a 'haplotype' be defined? The millions of viral sequences isolated from a single host (namely a sequencing library) would be highly polymorphic due to promiscuous deamination by ADARs and APOBECs (a phenomenon termed 'intra-host polymorphism').
In theory, each RNA molecule should be regarded as 'a virus individual' and the unit of LD should be each 'single sequencing read'. Unfortunately, in most of the SARS-CoV-2 literature dealing with 'strains' [18,19], one sequencing library from one host is regarded as 'one strain' for LD analysis [17], regardless of the intra-host polymorphisms caused by deamination. Moreover, misdefinition of 'a virus individual' also affects the definition of the effective population size Ne. It remains debatable whether the Ne of SARS-CoV-2 refers to the number of infected humans, the number of infected cells, or the number of SARS-CoV-2 RNA molecules. When RNA deamination and intra-host polymorphisms are considered, it becomes evident that each SARS-CoV-2 molecule should be treated as a virus individual. Likewise, evolutionary analyses of recombination, nucleotide diversity (Theta-Pi), selection strength (Tajima's D) and allele frequency should also take deamination events into account [20].
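To make the single-molecule view of LD concrete, the sketch below computes the standard two-site statistics D = p_ij - p_i * p_j and r^2 = D^2 / (p_i(1 - p_i) p_j(1 - p_j)), treating each full-length read as one haplotype. This is a minimal illustration rather than a recommended pipeline: the function name `linkage_disequilibrium` and the four toy molecules are hypothetical, and the reads are assumed to span both sites and to be free of sequencing errors.

```python
def linkage_disequilibrium(haplotypes, site_i, site_j, allele_i, allele_j):
    """Return (D, r^2) for two sites, with each sequence taken as one haplotype.

    Hypothetical helper for illustration; `haplotypes` is assumed to be a
    list of equal-length strings, one per RNA molecule (sequencing read).
    """
    n = len(haplotypes)
    # Frequency of the chosen allele at each site.
    p_i = sum(h[site_i] == allele_i for h in haplotypes) / n
    p_j = sum(h[site_j] == allele_j for h in haplotypes) / n
    # Frequency of molecules carrying both alleles together.
    p_ij = sum(
        h[site_i] == allele_i and h[site_j] == allele_j for h in haplotypes
    ) / n
    d = p_ij - p_i * p_j
    denominator = p_i * (1 - p_i) * p_j * (1 - p_j)
    # r^2 is undefined when either site is monomorphic.
    r2 = d * d / denominator if denominator > 0 else float("nan")
    return d, r2


# Toy intra-host library (assumed data): U at site 0 always co-occurs
# with G at site 2 on the same molecule.
molecules = ["UAGU", "UAGU", "CAAU", "CAAU"]

d, r2 = linkage_disequilibrium(molecules, 0, 2, "U", "G")
print(d, r2)
```

If the same library were collapsed into a single consensus 'strain', both sites would appear monomorphic within the host and this linkage information would be lost, which is the concern raised above.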

Future perspective
In this article, we have argued that the expression patterns of human RNA deaminases determine the tissue specificity of SARS-CoV-2 infection, the abundance of mutations, and the fast evolution of the virus. We are also concerned that the previous definition of 'a SARS-CoV-2 strain/haplotype' should be updated due to the high level of intra-host polymorphism.
We urge that next-generation sequencing of SARS-CoV-2 isolates use longer reads that cover the whole RNA, in order to determine the linkage of distantly located mutation sites. Allele frequency and diversity analyses should be performed at the single-molecule level. Moreover, unlike in human population genetics, the prevalence of mutations in SARS-CoV-2 is dictated by an additional factor, namely the activity of deaminases in the host. Therefore, researchers should seriously consider what the mutation spectrum tells us before drawing conclusions from incomplete observations.
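As an illustration of a diversity analysis at the single-molecule level, nucleotide diversity (Theta-Pi) can be computed as the average number of pairwise differences per site among the molecules of one library. The sketch below is a toy example under stated assumptions: the function name `nucleotide_diversity` and the four molecules are hypothetical, and the sequences are assumed to be full-length, gap-free and of equal length.

```python
from itertools import combinations


def nucleotide_diversity(molecules):
    """Average pairwise differences per site among equal-length sequences.

    Hypothetical helper for illustration; requires at least two molecules.
    """
    length = len(molecules[0])
    pairs = list(combinations(molecules, 2))
    # Total mismatches summed over all unordered pairs of molecules.
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy intra-host library (assumed data, not real viral sequence).
molecules = ["ACGU", "ACGU", "AUGU", "AUGG"]

pi = nucleotide_diversity(molecules)
print(round(pi, 4))
```

Computed this way, the statistic reflects variation among individual RNA molecules within a host rather than variation among consensus sequences from different hosts.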